Abstract 

Granulocyte-colony stimulating factor (G-CSF) is used worldwide to prevent neutropenia caused by high- 
dose chemotherapy. It has limited stability, strict formulation and storage requirements, and because of poor 
oral absorption must be administered by injection (typically daily). Thus, there is significant interest in 
developing analogs with improved pharmacological properties. We used our ultrahigh throughput compu- 
tational screening method to improve the physicochemical characteristics of G-CSF. Improving these 
properties can make a molecule more robust, enhance its shelf life, or make it more amenable to alternate 
delivery systems and formulations. It can also affect clinically important features such as pharmacokinetics. 
Residues in the buried core were selected for optimization to minimize changes to the surface, thereby 
maintaining the active site and limiting the designed protein's potential for antigenicity. Using a structure 
that was homology modeled from bovine G-CSF, core designs of 25-34 residues were completed, corresponding
to 10^21-10^28 sequences screened. The optimal sequence from each design was selected for 
biophysical characterization and experimental testing; each had 10-14 mutations. The designed proteins 
showed enhanced thermal stabilities of up to 13°C, displayed five- to 10-fold improvements in shelf life, and 
were biologically active in cell proliferation assays and in a neutropenic mouse model. Pharmacokinetic 
studies in monkeys showed that subcutaneous injection of the designed analogs results in greater systemic 
exposure, probably attributable to improved absorption from the subcutaneous compartment. These results 
show that our computational method can be used to develop improved pharmaceuticals and illustrate its 
utility as a powerful protein design tool. 
Many techniques have been used in the design of new and 
improved proteins. In vitro directed evolution methods such 
as phage display, DNA shuffling, and error-prone PCR are 
widely used. Rational design approaches continue to be ap- 
plied, and strategies that combine both are now being used. 
Successful designs include enzymes (Chen and Arnold 
1991; Stemmer 1994; Zhao et al. 1998) and other proteins 
(Crameri et al. 1996), as well as therapeutically useful pro- 
teins such as hormones and cytokines (Lowman and Wells 
1993; Heikoop et al. 1997; Grossmann et al. 1998; Chang et 
al. 1999). The experimental techniques involve the genera- 
tion and screening of libraries of random protein sequences. 
However, the number of sequences that can be screened
experimentally is limited (about 10^14 for library panning and 10^7
for high throughput screening). Libraries of this size allow for 
the simultaneous modification of only about 10 residues. 
Computational methods have also been used that perform 
in silico screening of protein sequences (Hellinga and 
Richards 1994; Desjarlais and Handel 1995; Dahiyat and 
Mayo 1996, 1997a; Street and Mayo 1999; Jiang et al. 2000; 
Kraemer-Pecore et al. 2001; Pokala and Handel 2001).
Exploiting the efficiency and speed of computers, these
methods can randomly screen a vast number of sequences (up to 
10^80), allowing for the simultaneous consideration and 
modification of more than 60 residues. Searching such large 
sequence spaces drastically improves the possibility of
finding novel protein sequences with improved properties. 
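
The link between library size and the number of residues that can be varied at once follows from simple counting. The sketch below assumes full randomization over all 20 amino acids at every position (an assumption made here for illustration; the function name is hypothetical) and asks how many positions a library of a given size can cover.

```python
import math


def max_positions(library_size: float, alphabet: int = 20) -> float:
    """Number of fully randomized positions whose sequence space equals library_size."""
    return math.log(library_size) / math.log(alphabet)


# Library sizes quoted in the text above.
for label, size in [
    ("high throughput screening", 1e7),
    ("library panning", 1e14),
    ("in silico screening", 1e80),
]:
    print(f"{label}: {size:.0e} sequences ~ {max_positions(size):.1f} positions")
```

Under this assumption, 10^14 sequences correspond to between 10 and 11 fully randomized positions and 10^80 to slightly more than 61, in line with the figures of about 10 and more than 60 residues given above.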

Investigators have recently developed a computational 
screening method that finds the optimal sequence for a de- 
fined three-dimensional structure, allowing all or part of the 
sequence to change (Dahiyat and Mayo 1996). This method, 
termed Protein Design Automation (PDA), scores the fit of 
sequences to the three-dimensional structure using physical- 
chemical potential functions that model the energetic inter- 
actions of protein atoms, including steric, solvation, and 
electrostatic interactions. PDA couples these potential func- 
tions with a highly efficient search algorithm to accurately 
screen up to 10^80 sequences. Because the screening is
performed in silico, multiple simultaneous mutations can be 
made, and novel sequences that are very different from wild 
type can be discovered. The method has been validated by 
numerous experimental tests and has resulted in the design 
of new proteins with improved stability and conformational 
specificity, and novel activity (Dahiyat and Mayo 1996, 
1997a; Malakauskas and Mayo 1998; Strop and Mayo 1999; 
Shimaoka et al. 2000; Bolon and Mayo 2001; Marshall and 
Mayo 2001). 

PDA also has the advantage of being able to control the 
location and type of mutations. For example, the design can 
be limited to the hydrophobic core. Mutations in the core 
can produce significant improvements in protein stability 
but do not change binding epitopes on the surface of the 
molecule. Thus, the molecular surface can be kept identical 
to the native structure, retaining biological activity and lim- 
iting toxicity and antigenicity. This feature is particularly 
important in the design of therapeutic proteins. 

We wanted to take advantage of these features of PDA 
and explore its utility in the design of improved pharma- 
ceuticals. We therefore used PDA as an ultrahigh through- 
put screen for improved analogs of a therapeutic protein, 
granulocyte-colony stimulating factor (G-CSF). G-CSF is a 
hematopoietic growth factor of 174 residues that induces 
differentiation and proliferation of granulocyte-committed 
progenitor cells. It is used clinically to treat cancer patients 
and alleviate the neutropenia induced by high-dose chemo- 
therapy. G-CSF belongs to the class of long-chain four- 
helix bundle cytokines that bind asymmetrically to homodi- 
meric complexes of cell-surface receptors to initiate an in- 
tracellular signaling cascade. Their structural similarity 
allows the design strategy chosen for G-CSF to be
immediately applicable to the other four-helix bundle cytokines 
(human growth hormone, erythropoietin, the interleukins, 
and interferon-α/β, all clinically important compounds) 
and thus broadens the potential impact of the results. 

Although the cytokines are functionally very efficacious, 
their pharmacological properties are not ideal. For example, 
G-CSF, like most proteins, is not absorbed orally to any 
significant extent and must be administered by frequent 
(daily) injections throughout the course of treatment. It also 
has limited stability and strict formulation and storage re- 
quirements, including the need to be kept refrigerated. Thus, 
there is significant interest in developing analogs with im- 
proved pharmacological properties. 

We sought to use PDA to improve the physicochemical 
characteristics of G-CSF. Improving these properties can 
make a molecule more robust, enhance its shelf life, or 
make it more amenable to use in alternate delivery systems 
and formulations. It can also affect clinically important fea- 
tures such as pharmacokinetics and result in a drug that is 
safer for human use. Our design strategy was to optimize the 
core to improve the stability and solution properties of 
G-CSF while preserving receptor binding and biological 
activity. 

The template structure used for in silico screening was a 
homology model of human G-CSF in which the human 
sequence was mapped onto bovine G-CSF. We designed 
several novel core sequences, cloned and expressed them, 
characterized their stabilities, tested them for functional ac- 
tivity both in vitro and in vivo, and studied their pharma- 
cokinetics in monkeys. The designed proteins showed en- 
hanced thermal stabilities, displayed five- to 10-fold im- 
provements in shelf life, and were biologically active both 
in cell proliferation assays and in a neutropenic mouse 
model. Subcutaneous injection of the most stable variant in 
monkeys also resulted in greater systemic exposure, prob- 
ably attributable to improved absorption from the subcuta- 
neous compartment. These results indicate that PDA has 
great potential as a powerful in silico tool in the design of 
improved pharmaceutical proteins. 

Results and Discussion 

Homology modeling 

The crystal structure of bovine G-CSF (PDB record 1bgc) 
(Lovejoy et al. 1993) was used as the starting point for 
modeling because the crystal structure of human G-CSF 
(PDB record 1rhg) (Hill et al. 1993) is at a lower resolution 
and is missing key fragments, including a structurally im- 
portant disulfide bond between positions 64 and 74. Bovine 
G-CSF is a good model for human G-CSF because the 
sequences are the same length and 142 of 174 amino acids 
are identical (82%). The residues that differ in the bovine 
sequence were replaced with the human residues for those 
positions, and the conformations of the replaced side chains 
were optimized using PDA. Most of the replaced residues 
were solvent exposed, thereby introducing little strain into 
the structure and allowing typical PDA parameters to be 
used for conformation optimization. One substitution, how- 
ever, was at a buried site, G167V, and clashed sterically 
with a nearby disulfide bond. To accommodate the larger 
Val, the side-chain conformation at this position was opti- 
mized using a less restrictive van der Waals scale factor (0.6 
instead of 0.9). The entire structure was then briefly mini- 
mized to relax the strain. The final structure that served as 
the template for all the designs is shown in Figure 1. 

Core designs 

Unlike many experimental sequence screening methods, 
PDA allows control over which residues are allowed to 
change. Core residues were selected because optimization 
of these positions can improve stability yet minimize 
changes to the molecular surface, thus limiting the designed 
protein's potential for antigenicity. Ala scanning studies of 
G-CSF indicate one or two binding sites on the protein 
surface that are probably responsible for granulopoietic
activity (Reidhaar-Olson et al. 1996; Young et al. 1997) (Fig. 1).
Although recent crystallographic studies of G-CSF
complexed to its receptor show only one binding site in a novel 
2:2 complex (Horan et al. 1996; Aritomi et al. 1999), both 
sites were avoided in the core designs to ensure preservation 
of function. 

[Fig. 1 legend: residues identical in bovine and human sequences (82%).]
Fig. 1. Template structure of hG-CSF used for Protein Design Automation 
(PDA) designs. The human sequence was homology modeled onto the 
bovine crystal structure (PDB record 1bgc). The residues that differ in the 
bovine sequence or were not present in the bovine crystal structure were 
replaced with the residues from the human sequence. The conformations of 
the replaced side chains were optimized using PDA (the larger Val at 
position 167 was optimized using a less restrictive van der Waals scale 
factor), and the entire structure was energy minimized for 50 steps. 

Two PDA design calculations were run: a deep core de- 
sign that included residues deeply buried in the interior of 
the protein and an expanded core design (exp_core) that 
also included less buried peripheral core residues. The deep 
core design had 26 core positions that were allowed to vary 
(shown yellow and gold in Fig. 2), whereas exp_core had 34 
(shown yellow and turquoise in Fig. 2). Only hydrophobic 
amino acids were considered at the variable core positions. 
These included Ala, Val, Ile, Leu, Phe, Tyr, and Trp. Gly 
was also allowed for the variable positions that had Gly in 
the bovine wild-type structure (positions 28, 149, 150, and 
167). Met and Pro were not allowed. 
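
The size of the sequence space searched in each design can be estimated from the number of variable positions and the allowed amino acids. The sketch below is a rough count that assumes seven choices at every position and ignores the few positions where Gly was also allowed; the function name is hypothetical.

```python
import math

# Hydrophobic amino acids allowed at the variable core positions.
HYDROPHOBIC = ["Ala", "Val", "Ile", "Leu", "Phe", "Tyr", "Trp"]


def log10_sequences(n_positions: int, n_choices: int = len(HYDROPHOBIC)) -> float:
    """log10 of the number of sequences for n_positions with n_choices each."""
    return n_positions * math.log10(n_choices)


designs = {"smallest design (25 positions)": 25, "deep core": 26, "exp_core": 34}
for name, n in designs.items():
    print(f"{name}: about 10^{log10_sequences(n):.1f} sequences")
```

With these assumptions, 25 to 34 variable positions give between roughly 10^21 and 10^28 sequences, consistent with the range quoted in the Abstract.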

Optimal sequences 

The optimal sequences selected by PDA are also shown in 
Figure 2. The optimal sequence from the deep core design 
had 10 mutations (named core10), and the optimal exp_core 
sequence had 11 (named exp_core11); thus, 33%-38% of 
the variable residues changed their identities. Eight of the 
mutated positions changed to the same amino acid in both 
designs. Changing the set of design positions can signifi- 
cantly impact the amino acid selected at a given position. 
For example, in the deep core design, Leu89 retains the 
same amino-acid identity and conformation as wild type. 
However, in the exp_core design, when Leu92 is also al- 
lowed to vary, both positions (Leu89 and Leu92) mutate to 
Phe, indicating a coupling between these two core residues. 
The modeled structure of the sequence selected in the deep 
core design (core10) is shown in Figure 3. 
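
The dependence of the selected residue on the set of design positions, as seen for Leu89 and Leu92, can be illustrated with a toy model. The sketch below is not PDA: the energies are made-up numbers in arbitrary units, the two positions are generic, and the search is brute-force enumeration rather than the search algorithm used by PDA. It only shows how a pairwise term can make the optimal residue at one position depend on whether its neighbor is allowed to vary.

```python
from itertools import product

ALPHABET = ["Ala", "Leu", "Phe"]

# Hypothetical single-position energies (lower is better), one dict per position.
SINGLE = [
    {"Ala": 0.0, "Leu": -1.0, "Phe": -0.5},  # position 0
    {"Ala": 0.0, "Leu": -1.0, "Phe": -0.8},  # position 1
]
# Hypothetical pairwise term: in this toy, two Phe pack well, Leu/Phe pairs clash.
PAIR = {("Phe", "Phe"): -1.5, ("Leu", "Phe"): 0.5, ("Phe", "Leu"): 0.5}


def energy(seq):
    """Sum of single-position terms plus the pairwise term for positions 0 and 1."""
    total = sum(SINGLE[i][aa] for i, aa in enumerate(seq))
    return total + PAIR.get((seq[0], seq[1]), 0.0)


def best_sequence(fixed=None):
    """Enumerate all sequences; `fixed` maps position -> residue held constant."""
    fixed = fixed or {}
    candidates = (
        seq
        for seq in product(ALPHABET, repeat=len(SINGLE))
        if all(seq[i] == aa for i, aa in fixed.items())
    )
    return min(candidates, key=energy)


print(best_sequence({1: "Leu"}))  # position 1 held at Leu
print(best_sequence())            # both positions free to vary
```

In this toy, position 0 stays Leu while position 1 is fixed at Leu, but both positions switch to Phe once position 1 is also allowed to vary.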

Native human G-CSF (met hG-CSF) and the optimal se- 
quence from each of the core designs were cloned, ex- 
pressed in Escherichia coli, and purified for experimental 
studies. 

Thermal stability 

The far-ultraviolet (UV) circular dichroism (CD) spectra for 
met hG-CSF and the designed proteins were nearly identical 
to each other and to published spectra for met hG-CSF 
(Reidhaar-Olson et al. 1996; Young et al. 1997), indicating 
highly similar secondary structure and tertiary folds (data 
not shown). Thermal denaturation was monitored at 222 
nm, and the melting temperatures (Tm values) were derived from 
the derivative curve of the ellipticity at 222 nm versus
temperature (Fig. 4). Thermal denaturation of G-CSF and its 
variants is irreversible; however, Tm can be used to quickly 
assess the relative stability of different mutants. Stability 
under storage conditions, which is more relevant clinically, 
was evaluated with shelf-life studies (see below). 

1220 Protein Science, vol. 11 

Computational stabilization of cytokines 

[Fig. 2 panel: aligned sequences (residues 1-174) of human G-CSF, bovine G-CSF, core10, core14_V167A, exp_core11, core2, and core8.
Tm (°C): core10, 73; core14_V167A, 61; exp_core11, 58; core2, 65; core8, 70.
Legend: positions variable in all 3 designs (deep core, exp_core, and core167V); positions variable in deep core design alone; positions variable in exp_core design alone.]
Fig. 2. Sequences of hG-CSF analogs. Native human and bovine sequences are shown at the top. The fragments missing in the crystal 
structure of the human sequence are shown boxed. Variable positions are colored. The deep core design had 26 variable positions, 
exp_core had 34, and core167V had 25. The optimal sequence from each design is shown. Letters indicate core residues that mutated 
relative to native hG-CSF; blanks indicate no change. Positions that changed to the same amino acid in all three core designs are 
indicated in bold. Core2 and core8 sequences were not obtained from PDA calculations but were derived by reverting some of the 
core10 mutations to wild type. Melting temperatures (Tm values) obtained for the designed proteins are also shown. 

The Tm for met hG-CSF was 60°C, identical to that
reported in other studies (Kolvenbach et al. 1997). Core10 
showed an increase in stability of 13°C, whereas the Tm of 
exp_core11 was very similar to wild type (Fig. 2 and Fig. 4). 
The increased stability seen with core10 may be attributable 
to improved packing interactions and optimized hydrophobic
burial of side chains. Other possibilities include
decreased aggregation resulting from elimination of the free 
cysteine at position 17. The Gly to Ala mutation at position 
28 caused a significant improvement in helical propensity 
that could also be the source of the improved stability. 

Identifying critical mutations using derived sequences 

To differentiate between these possibilities, two additional 
sequences derived from the core10 mutant sequence were 
made and their Tm values measured. One of these (core8) was 
identical to core10 except that two mutations distant from 
the others were reverted to wild type (L103V and V110I). 
These were the two positions that did not mutate in 
exp_core11. The Tm of core8 was 70°C, similar to core10, 
indicating that the mutations at 103 and 110 were not
responsible for core10's improved stability. 
Fig. 3. Modeled structure of hG-CSF analog (core10) obtained from deep 
core design. Twenty-six core residues were allowed to vary; computational 
screening with PDA resulted in 10 mutations: C17L, G28A, L78F, Y85F, 
L103V, V110I, F113L, V151I, V153I, and L168F. 

To determine the importance of the other mutations,
another sequence was made (core2) that contained only two of 
the core10 mutations, G28A and C17A; all other residues 
were identical to wild type (Fig. 2). The Tm of core2 was 
5°C higher than wild type, indicating that improvements in 
helical propensity and the elimination of a free cysteine are 
important for heightened thermostability. The remainder of 
the increase in Tm seen for core10 may be attributable to 
improved packing interactions and increased hydrophobic 
burial. 



Storage stability 

Increased shelf life is important for distribution and storage 
and is a desirable feature for G-CSF and other protein drugs. 
Because aggregation and chemical degradation are the pre- 
dominant mechanisms of inactivation of G-CSF (Herman et 
al. 1996), shelf life was estimated by incubating the proteins 
at elevated temperature and then using size-exclusion
chromatography to observe the disappearance of monomeric 
protein. Chemical degradation was estimated using
reverse-phase chromatography (data not shown). Core2 and core10 
showed five- and 10-fold improvements in storage stability, 
respectively, at 50°C (Fig. 5). Rate constants were deter- 
mined by a first order exponential fit of the fraction mono- 
mer remaining/time curves using KaleidaGraph (Synergy 
Software). 
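
As an illustration of how a rate constant can be extracted from monomer-remaining data, the sketch below fits first-order decay by linear least squares on the logarithm of the fraction remaining. This is a stand-in for the exponential fit performed in KaleidaGraph, not a reproduction of it, and the data are synthetic values generated from assumed rate constants rather than measurements.

```python
import math


def first_order_rate(times, fractions):
    """Least-squares slope of ln(fraction) versus time.

    Returns k for the model f(t) = A * exp(-k * t).
    """
    ys = [math.log(f) for f in fractions]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx


# Synthetic example: assumed rate constants (per day), not measured values.
days = [0, 2, 4, 6, 8, 10, 12, 14]
k_ref_true, k_var_true = 0.50, 0.05
reference = [math.exp(-k_ref_true * t) for t in days]
variant = [math.exp(-k_var_true * t) for t in days]

k_ref = first_order_rate(days, reference)
k_var = first_order_rate(days, variant)
print(f"fold improvement in storage stability: {k_ref / k_var:.1f}")
print(f"half-lives: {math.log(2) / k_ref:.2f} d vs {math.log(2) / k_var:.2f} d")
```

The fold improvement is taken here as the ratio of the two rate constants, which for first-order decay equals the ratio of half-lives.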

Biological activity 

Granulopoietic activity was determined in vitro by
quantitating cell proliferation as a function of protein
concentration in murine lymphoid cells transfected with the gene for 
the human G-CSF receptor. The designed proteins were as 
active as wild-type hG-CSF (data not shown). The designed 
analogs were also as effective as wild type in increasing 
white blood cell and polymorphonuclear neutrophil levels in 
the neutropenic mouse (Fig. 6). Neutropenia, characterized 
by an abnormally low level of neutrophils in the blood, was 
induced by injection of cyclophosphamide. Reversal of this 
effect by the designed analogs shows that granulopoietic 
activity was also retained in vivo. 

Fig. 4. Thermal stability of hG-CSF analogs. Thermal stability was
assessed by monitoring the temperature dependence of the circular dichroism 
spectral signal at 222 nm. Melting temperatures (Tm values) were derived from 
the derivative curve of the ellipticity at 222 nm versus temperature. Core10 
and core2 showed increases in Tm of 13°C and 5°C, respectively, over 
native met hG-CSF. 

[Fig. 5 plot; x-axis: incubation time (days), 0 to 14.]
Fig. 5. Shelf life of hG-CSF analogs. Shelf life was estimated by
incubating the proteins at elevated temperature (50°C) and using size exclusion 
chromatography to observe disappearance of monomeric protein. Rate
constants were determined by a first order exponential fit of the fraction 
monomer remaining/time curves. Core2 and core10 showed five- and
10-fold improvements in storage stability, respectively, over met hG-CSF 
controls. 

Fig. 6. In vivo granulopoietic activity of hG-CSF analogs. Mice were 
rendered neutropenic with a single intraperitoneal injection of 200 mg/kg 
cyclophosphamide (CPA). Beginning 24 h later and for 4 consecutive days, 
the mice were given a daily intravenous injection of 100 µg/kg of native 
hG-CSF (filgrastim, Amgen), an hG-CSF analog, or saline. On day 5, 
granulopoietic activity was determined by counting the number of white 
blood cells and polymorphonuclear neutrophils (PMN). The designed
analogs (core8 and core10) were as effective as controls in eliciting a
granulopoietic response. 

Pharmacokinetics 

The pharmacokinetics of core10 and native hG-CSF
(filgrastim, Amgen) was studied in cynomolgus monkeys after 
a single subcutaneous or intravenous injection of 5 µg/kg 
and after daily subcutaneous injections of 5 µg/kg for 28 d. 
Analysis of the serum concentration-time curves shows that 
subcutaneous injection of the designed analog results in 
greater systemic exposure (area under concentration-time 
curve, AUC) than the same dose of wild-type hG-CSF (Fig. 
7B). This was true after a single dose on day 1 (78.8 vs. 54.6 
ng·h/mL, data not shown), as well as on day 28 (37.2 vs. 
17.4 ng·h/mL). There were no measurable differences in 
serum half-life. In the intravenous study, however, the
half-life of core10 was three-fold shorter (1 vs. 3 h), and the 
AUC was significantly less (54.7 vs. 117.4 ng·h/mL),
indicating that core10 is cleared faster (Fig. 7A). Taken
together, these data indicate that the designed analog is
absorbed more quickly from the subcutaneous compartment 
(absorption could not be measured directly given the small 
number of data points at early times). Improved absorption 
may be attributable to decreased aggregation or association 
of the designed protein. The increased monomer lifetime 
and decreased aggregation seen in our shelf-life studies and 
the improved thermal stability of the native conformation 
observed for core10 indicate a decrease in aggregation in 
the subcutaneous compartment. This possibility is
supported by the fact that other protein therapeutics engineered 
for reduced aggregation also show faster absorption rates. 
For example, insulin Lispro and other rapid-acting insulin 
analogs that were designed to decrease their tendency to 
self-associate are absorbed faster than regular insulin after 
subcutaneous injection (Howey et al. 1994; Home et al. 
1999). 
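
Systemic exposure is summarized as the area under the concentration-time curve. A minimal sketch of the linear trapezoidal rule often used in noncompartmental analysis is shown below; the concentration-time profile is hypothetical and serves only to show the calculation, whereas the two exposure ratios use the AUC values reported above.

```python
def auc_trapezoid(times_h, conc_ng_per_ml):
    """Linear trapezoidal AUC from the first to the last sampling time (ng*h/mL)."""
    pairs = list(zip(times_h, conc_ng_per_ml))
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for (t0, c0), (t1, c1) in zip(pairs, pairs[1:])
    )


# Hypothetical concentration-time profile, for illustration only.
times = [0.0, 1.0, 2.0, 4.0, 8.0]   # hours after dosing
conc = [0.0, 4.0, 6.0, 3.0, 1.0]    # ng/mL
auc = auc_trapezoid(times, conc)
print(f"AUC(0-8 h) = {auc:.1f} ng*h/mL")

# Subcutaneous exposure ratios (core10 / wild type) from the reported AUC values.
ratio_day1 = 78.8 / 54.6
ratio_day28 = 37.2 / 17.4
print(f"day 1: {ratio_day1:.2f}-fold, day 28: {ratio_day28:.2f}-fold")
```

The reported values correspond to roughly 1.4-fold greater exposure after a single dose and roughly 2.1-fold greater exposure on day 28.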
Fig. 7. Pharmacokinetics of hG-CSF analogs. Plasma concentrations of a 
designed hG-CSF analog or wild-type hG-CSF (filgrastim, Amgen) were 
determined after administration in cynomolgus monkeys. (A) Animals were 
given a single intravenous injection of 5 µg/kg or (B) daily subcutaneous 
injections of 5 µg/kg for 28 d. Noncompartmental analysis of the serum 
concentration-time curves shows that subcutaneous injections of the core10 
analog resulted in greater systemic exposure (area under concentration-time
curve, AUC) than the same dose of wild-type hG-CSF, whereas there 
was no change in serum half-life (t1/2). In the intravenous study, the AUC 
was significantly less and the t1/2 three-fold shorter, indicating that core10 
was cleared faster. 
Comparison to published G-CSF variants 

In vitro and cassette mutagenesis studies have shown that 
alterations of the N-terminal region of G-CSF can lead to 
improved granulopoietic activity (Kuga et al. 1989; Okabe 
et al. 1990). Point mutations at Cys17 have also been found 
to affect shelf life; replacement with Ala led to an increase, 
Ser had no effect, and large residues (Ile, Tyr, Arg) led to a 
decrease (Ishikawa et al. 1992). In contrast, our core10
sequence, which has a large residue (Leu) at this position, 
showed an improved shelf life. This may be explained by 
the observation that in a Cys17Leu point mutant, the Leu side 
chain would clash with the aromatic ring of the nearby Phe 
at position 113. This steric clash does not occur in core10, 
however, because the Phe at 113 is replaced by Leu and, in 
compensation for this change, two nearby Leu residues become 
Phe (at positions 78 and 168). Thus, multiple mutations 
allow complementary repacking of the hydrophobic core in 
the core10 mutant and may be responsible for its enhanced 
stability and shelf life. 

Significant improvements in thermal stability were also 
observed when the seven helical Gly residues in G-CSF 
were replaced with Ala to form point, double, and triple 
mutants (Bishop et al. 2001). Substitutions at positions 26, 
28, 149, and 150 were the most effective. The investigators 
attributed the stabilizing effect to the enhancement in
α-helical propensity associated with the Gly/Ala substitutions. 
These data support our suggestion that the heightened ther- 
mal stability seen with our mutants (which also contain a 
Gly/Ala substitution at position 28) is at least in part attrib- 
utable to an improvement in helical propensity. 

Probing the robustness of PDA with 
a homology modeled core position 

As pointed out previously, the homology modeling of hu- 
man G-CSF onto the bovine structure was straightforward 
for the most part because the replaced residues were prima- 
rily solvent exposed and no rearrangement of the backbone 
was necessary. The change at one core position, however, 
G167V, induced a steric clash and energy minimization of 
the entire protein was used to relieve the strain. We decided 
to assess the impact of this manipulation by doing an
additional design (core167V) in which the variable residues 
were essentially the same as in the deep core design except 
that position 167 was also allowed to vary. We found that 
Val167 mutated to Ala (the other mutations were essentially 
the same as for core10). To probe the plasticity of the core, 
instead of using this PDA optimal sequence, which only had 
two mutations in this region, we ran experiments on another 
high-scoring sequence (core14_V167A) that had additional 
mutations (14 total, including L157I, F160W, and L161F). 
This sequence was chosen because it balanced an extensive 
number of mutations with a relatively high design score. 
Although it ranked 21st in the sequence energy list and was 
2 kcal/mol less favorable than the optimal sequence, it was 
still biologically active and as stable as wild type (Tm of 
61°C) (Figs. 2, 4). This indicates that optimization with 
PDA is fairly robust, and that the protein core can be quite 
plastic and can accommodate large changes without
sacrificing stability or function. 

Conclusions 

PDA is a powerful ultrahigh throughput computational 
screening method. Its ability to screen up to 10^80 sequences 
and allow multiple simultaneous mutations significantly in- 
creases the likelihood of finding new and improved pro- 
teins. In this study, PDA was used to develop improved 
analogs for a therapeutically important protein, hG-CSF. 
The novel proteins showed enhanced thermal stabilities and 
shelf life while retaining biological activity. Analysis of the 
mutants and results obtained with derived sequences indi- 
cates that the heightened stability is attributable to improve- 
ments in helical propensity and the elimination of a free 
cysteine; improved core packing and optimized hydropho- 
bic burial of side chains may also be important. Pharmaco- 
kinetic studies indicate that subcutaneous injection of the 
most stable variant results in greater systemic exposure, 
probably attributable to improved absorption from the sub- 
cutaneous compartment. 

These results show that PDA can be successfully applied 
to proteins of therapeutic interest. They also illustrate the 
value of its precise control over the site and type of muta- 
tions, allowing for the rational design of desired properties 
such as improved stability and pharmacokinetics and the 
elimination of undesirable ones such as toxicity and antige- 
nicity. These features are particularly important in the de- 
sign of therapeutic proteins. PDA thus has great potential as 
a powerful in silico tool for therapeutic protein design. 

Materials and methods 

Template structure preparation 

The template structure for the designed proteins was produced by 
homology modeling using the crystal structure of bovine G-CSF 
(Brookhaven Protein Data Bank code 1bgc) as the starting point. 
The program BIOGRAF (Molecular Simulations Inc., San Diego, 
CA) was used to generate explicit hydrogens on the structure, 
which was then minimized for 50 steps using the conjugate gra- 
dient method and the Dreiding II force field (Mayo et al. 1990). 
The residues that differ in the bovine sequence or were not present 
in the bovine crystal structure were replaced with the human
residues for those positions. The conformations of the replaced side 
chains were optimized using PDA (Dahiyat and Mayo 1997a,b), 
and the entire structure was minimized again for 50 steps. This 
minimized structure was used as the template for all the designs. 
Protein design 

Analogs of hG-CSF were designed by simultaneously optimizing 
residues in the buried core of the protein using PDA. The compu- 
tational details, residue classification, potential functions, and pa- 
rameters used for van der Waals interactions, solvation, and hy- 
drogen bonding are described in previous work (Dahiyat and Mayo 
1996, 1997a). An expanded version of the backbone-dependent 
rotamer library of Dunbrack and Karplus (Dunbrack and Karplus 
1993) was used in all the calculations. The global optimum se- 
quence from each design was selected for characterization and 
experimental testing, except for core167V in which the 21st ranked 
sequence was used. Calculations were generally performed over- 
night using 16 processors of an SGI Origin 2000 with 32 R10000 
processors running at 195 MHz. The length of the runs varied from 
1 to several hours of CPU time. 



Cloning and expression 

A gene for met hG-CSF was synthesized from partially
overlapping oligonucleotides (~100 bases) that were extended and PCR 
amplified. Codon usage was optimized for E. coli and several 
restriction sites were incorporated to ease future cloning. These 
partial genes were cloned into a vector and transformed into E. coli 
for sequencing. Several of these gene fragments were then cloned 
into adjacent positions in an expression vector (pET17 or pET21) 
to form the full-length gene for met hG-CSF (528 bases) and 
transformed into E. coli for expression. Protein was expressed in E. 
coli in insoluble inclusion bodies and its identity was confirmed by 
immunoblot of SDS-PAGE using a commercial mAb against 
hG-CSF. 



Refolding, purification, and storage 

The protein inclusion bodies were solubilized in detergent and 
refolded in the presence of CuSO4 to promote formation of native 
disulfide bonds (Lu et al. 1992). A size-exclusion column (10 
mm x 300 mm loaded with Superdex prep 75 resin purchased from 
Pharmacia) was loaded with protein and eluted at a flow rate of 0.8 
mL/min using the column buffer (100 mM Na2SO4, 50 mM Tris, 
pH 7.5). The peaks were monitored at dual wavelengths of 214 nm 
and 280 nm. Albumin, carbonic anhydrase, cytochrome C, and 
aprotinin were used to calibrate the molecular size of proteins 
versus elution time. The monomeric peak that elutes around the 
expected elution time for each protein was collected and the buffer 
was exchanged into 10 mM NaOAc at pH 4 for biophysical 
characterization. For long-term storage, a buffer of 5% sorbitol, 
0.004% Tween 80, and 10 mM NaOAc at pH 4 was used. A pH of 
4 was chosen for these buffers to be consistent with the commercial 
formulation of hG-CSF (Amgen), which was used as a control. 
The proteins were >98% pure as judged by reversed phase high 
performance liquid chromatography (HPLC) on a C4 column (3.9 
mm x 150 mm) with a linear acetonitrile-water gradient containing 
0.1% TFE. The identities of all proteins were confirmed by 
comparing the molecular mass measured by mass spectrometry with 
the corresponding molecular mass calculated using the protein 
sequences. 
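
The size calibration described above can be sketched as a straight-line fit of log10(molecular mass) against elution time. This is only an illustration under stated assumptions: the masses are approximate nominal values for the four standards, the elution times are hypothetical placeholders rather than measured data, and the log-linear relationship is the usual working approximation for size-exclusion columns, not a procedure confirmed by the text.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def apparent_mass(elution_min, slope, intercept):
    """Convert an elution time into an apparent molecular mass (Da)."""
    return 10 ** (intercept + slope * elution_min)

# (name, approximate mass in Da, HYPOTHETICAL elution time in min)
standards = [
    ("albumin", 66000.0, 11.5),
    ("carbonic anhydrase", 29000.0, 13.6),
    ("cytochrome c", 12400.0, 15.8),
    ("aprotinin", 6500.0, 17.4),
]

elution_times = [s[2] for s in standards]
log_masses = [math.log10(s[1]) for s in standards]
slope_cal, intercept_cal = fit_line(elution_times, log_masses)

# Apparent mass of a peak eluting at a hypothetical 14.7 min.
peak_mass = apparent_mass(14.7, slope_cal, intercept_cal)
```

A peak whose apparent mass falls near the mass expected for the monomer would then be the one collected; larger apparent masses would point to dimers or aggregates.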



Spectroscopic characterization 

Protein samples were 50 µM in 50 mM sodium phosphate at pH 
5.5. Concentrations were determined using UV spectrophotometry. 
Protein structure was assessed by CD. CD spectra were measured 
on an Aviv 202DS spectrometer equipped with a Peltier temperature 
control unit using a 1-mm path length cell. Thermal stability 
was assessed by monitoring the temperature dependence of the CD 
signal at 222 nm (Kolvenbach et al. 1997). A buffer of 10 mM 
NaOAc was used at pH 4.0 and data were collected every 2.5°C 
with an averaging time of 5 sec and an equilibration time of 3 min. 
Thermal denaturation curves were smoothed using KaleidaGraph. 
The melting temperature (Tm) of each protein was derived from 
the derivative curve of the ellipticity at 222 nm versus temperature. 
The Tm values were reproducible to within 2°C for the same 
protein at the concentrations used. 
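
As an illustration of reading a melting temperature from the derivative curve, the sketch below takes the temperature at which the finite-difference derivative of the 222 nm signal is steepest. It assumes that the melting temperature corresponds to the extremum of the derivative, omits the smoothing step, and runs on a synthetic two-state melt. The function names and all numerical values are hypothetical and are not taken from the study.

```python
import math

def melting_temperature(temps, ellipticity):
    """Return the temperature at which |d(signal)/dT| is largest.

    temps: strictly increasing temperatures (deg C)
    ellipticity: CD signal at 222 nm measured at each temperature
    """
    if len(temps) != len(ellipticity) or len(temps) < 3:
        raise ValueError("need at least three paired (temperature, signal) points")
    best_temp, best_slope = None, 0.0
    for i in range(1, len(temps) - 1):
        # Central finite difference approximates the derivative at temps[i].
        slope = (ellipticity[i + 1] - ellipticity[i - 1]) / (temps[i + 1] - temps[i - 1])
        if best_temp is None or abs(slope) > abs(best_slope):
            best_temp, best_slope = temps[i], slope
    return best_temp

def two_state_signal(t, tm=60.0, width=2.0, folded=-20.0, unfolded=-5.0):
    """Synthetic sigmoidal melt used only to exercise the function."""
    frac_unfolded = 1.0 / (1.0 + math.exp(-(t - tm) / width))
    return folded + (unfolded - folded) * frac_unfolded

# Sample every 2.5 deg C from 20 to 90 deg C, matching the stated step size.
temps = [20.0 + 2.5 * i for i in range(29)]
signal = [two_state_signal(t) for t in temps]
tm_estimate = melting_temperature(temps, signal)
```

With a 2.5°C sampling interval the estimate can only fall on a sampled temperature, which is in line with the stated reproducibility of about 2°C.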



Storage stability 

The storage stability of the designed proteins was assessed by 
incubation at both 37°C and 50°C under solution conditions 
identical to those used in the commercial formulation of hG-CSF 
(filgrastim, Amgen). Because aggregation and chemical degradation 
are the predominant mechanisms of inactivation of G-CSF (Herman 
et al. 1996), accelerated degradation was followed by observing 
the disappearance of monomeric protein with both size-exclusion 
and reverse-phase chromatography. Rate constants for shelf-life 
estimation were determined by a first-order exponential fit of 
the curves of fraction of monomer remaining versus time using 
KaleidaGraph (Synergy Software). 
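
A first-order rate constant of this kind can be estimated as sketched below. The sketch substitutes a least-squares fit of ln(fraction) against time, forced through fraction = 1 at time zero, for the nonlinear fit done in KaleidaGraph. The time points and fractions are synthetic. Reading the ratio of two rate constants as a relative shelf life is an assumption made here for illustration, since the text does not state the shelf-life criterion.

```python
import math

def first_order_rate_constant(times, fraction_monomer):
    """Fit fraction = exp(-k * t) by least squares on ln(fraction).

    The line is constrained to pass through ln(1) = 0 at t = 0.
    All fractions must be greater than zero.
    """
    if len(times) != len(fraction_monomer):
        raise ValueError("times and fractions must have the same length")
    num = sum(t * math.log(f) for t, f in zip(times, fraction_monomer))
    den = sum(t * t for t in times)
    if den == 0:
        raise ValueError("need at least one time point after t = 0")
    return -num / den

# Synthetic decay curves (illustrative only, not measured data).
days = [0.0, 1.0, 2.0, 4.0, 7.0]
control = [math.exp(-0.10 * t) for t in days]
analog = [math.exp(-0.02 * t) for t in days]

k_control = first_order_rate_constant(days, control)
k_analog = first_order_rate_constant(days, analog)

# For first-order loss, the time to reach any fixed fraction scales as 1/k,
# so the ratio of rate constants gives the relative persistence of monomer.
fold_slower = k_control / k_analog
```

Fitting on the log scale weights late, low-fraction points more heavily than a direct exponential fit would, so the two approaches can give slightly different rate constants on noisy data.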



Cell proliferation assay 

Granulopoietic activity was measured by quantifying cell 
proliferation as a function of protein concentration using Ba/F3 
(murine lymphoid) cells stably transfected with the gene encoding 
the human Class 1 G-CSF receptor (Avalos et al. 1995). Cell 
proliferation was detected by 5-bromo-2'-deoxyuridine (BrdU) 
incorporation quantified by a BrdU-specific ELISA kit (Boehringer 
Mannheim). 



In vivo biological activity 

Granulopoietic activity was determined in the neutropenic mouse 
(Hattori et al. 1990). C57BL/6 mice were rendered neutropenic 
with a single intraperitoneal injection of 200 mg/kg 
cyclophosphamide (CPA). Beginning 24 h later and for 4 consecutive 
days, the mice were given a daily intravenous injection of 
100 µg/kg of an hG-CSF analog, met hG-CSF produced in our 
laboratory, clinically available hG-CSF (filgrastim, Amgen), or 
saline. On day 5, 6 h after the final dose, the animals were killed, 
blood samples were collected, and granulopoietic activity was 
determined by counting the number of white blood cells and 
polymorphonuclear neutrophils. 



Pharmacokinetics 

Plasma concentrations of a designed hG-CSF analog or wild-type 
hG-CSF (filgrastim, Amgen) were determined following 
administration in cynomolgus monkeys. Animals were given a single 
intravenous injection of 5 µg/kg or daily subcutaneous injections of 
5 µg/kg for 28 d. In the intravenous study, blood samples were 
collected at 0 (predose), 5, 15, and 30 min and 1, 2, 4, 6, 8, 12, and 
24 h postdosing. In the subcutaneous studies, blood samples were 
collected at 0 (predose), 1, 2, 4, 6, 8, 12, and 24 h postdosing on 
day 1 and day 28. All samples were immediately placed on wet ice 
and centrifuged at 2-8°C. The resultant plasma was then frozen and 
stored (-70°C). Plasma concentrations were determined using an 
enzyme-linked immunosorbent assay (Quantikine human G-CSF 
ELISA, R&D Systems, Minneapolis, MN), performed per 
manufacturer's instructions except that samples were diluted in PBS, 
5% nonfat dry milk, and 0.05% Tween 20, and the incubation was 
extended to overnight at 4°C. Plasma concentrations of the 
designed hG-CSF analog and filgrastim were estimated from their 
corresponding standard curves. Pharmacokinetic parameters were 
calculated by noncompartmental analysis. The terminal slope (λz) 
was estimated by linear regression through the last time points of 
the log concentration versus time curves and used to calculate the 
terminal half-life (t1/2). The area under the curve from time of 
dosing through the last time point (AUC0-t) was calculated by the 
linear trapezoid method. 
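
The noncompartmental quantities named above can be sketched as follows: a log-linear regression over the last few points for the terminal slope, the half-life as ln 2 divided by that slope, and the linear trapezoid rule for the area under the curve. The number of terminal points (three) is an assumption, because the text says only "the last time points", and the concentrations are synthetic values laid over the intravenous sampling schedule.

```python
import math

def terminal_slope(times, concs, n_points=3):
    """Terminal rate constant from regression of ln(conc) on time.

    Uses the last n_points samples, which must all be positive.
    Returned as a positive number for a declining curve.
    """
    t = times[-n_points:]
    y = [math.log(c) for c in concs[-n_points:]]
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return -sxy / sxx

def half_life(lambda_z):
    """Terminal half-life from the terminal rate constant."""
    return math.log(2) / lambda_z

def auc_linear_trapezoid(times, concs):
    """Area under the curve from the first to the last sample."""
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:])
    )

# Intravenous sampling schedule in hours (predose, 5, 15, 30 min, then 1-24 h).
times_h = [0.0, 5 / 60, 15 / 60, 30 / 60, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 24.0]
# Synthetic mono-exponential profile; the predose sample is zero.
concs = [0.0] + [50.0 * math.exp(-0.2 * t) for t in times_h[1:]]

lambda_z = terminal_slope(times_h, concs)
t_half = half_life(lambda_z)
auc_0_t = auc_linear_trapezoid(times_h, concs)
```

Natural logarithms are used so that the half-life follows directly from ln 2 divided by the slope. A fit on log10 concentrations would need the slope multiplied by ln 10 first.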
Pending Claims for A-66103-1 (09/285,912) as of December 12, 2001 



47. (New) A molecular library comprising a plurality of members, each member comprising a 
recombinant nucleic acid, wherein each of said members comprises a fusion nucleic acid 
comprising, from 5' to 3': 

i) a first nucleic acid encoding a first dimerization peptide; 

ii) a second nucleic acid encoding a random peptide; and 

iii) a third nucleic acid encoding a second dimerization peptide; 
wherein each of said random peptides is different. 

48. (New) A molecular library according to claim 47, wherein at least one said dimerization 
peptide is FLIVK. 

49. (New) A molecular library according to claim 47, wherein at least one said dimerization 
peptide is KFLIVKS. 

50. (New) A molecular library according to claim 47, wherein at least one said dimerization 
peptide is FLIVE. 

51. (New) A molecular library according to claim 47, wherein said first dimerization peptide is 
FLIVK and said second dimerization peptide is FLIVE. 

52. (New) A cellular library comprising a plurality of cells, each cell comprising a member of 
the molecular library of claim 47. 